Gang Wu, Tanya Leise, and John Hogenesch
2016-05-20
The MetaCycle package is used to detect rhythmic signals in large scale time-series data. It uses three algorithms, ARSER (ARS), JTK_CYCLE (JTK), and Lomb-Scargle (LS), to detect periodic signals and can output integrated analysis results.
Usually, circadian time-series data is evenly sampled and the interval between time points is an integer (e.g. 4 hrs). However, sometimes there are replicate samples, missing values or un-evenly sampled series, or sampling with a non-integer interval. Examples of these types of data are shown in this table.
| Data Type | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Point 6 |
|---|---|---|---|---|---|---|
| The usual data | CT0 | CT4 | CT8 | CT12 | CT16 | CT20 |
| With missing value | CT0 | NA | CT8 | CT12 | CT16 | CT20 |
| With replicates | CT0 | CT0 | CT8 | CT8 | CT16 | CT16 |
| With un-even interval | CT0 | CT2 | CT8 | CT10 | CT16 | CT20 |
| With non-integer interval | CT0 | CT4.5 | CT9 | CT13.5 | CT18 | CT22.5 |
Some datasets can have one or more of these conditions, e.g.
| Data Type | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Point 6 |
|---|---|---|---|---|---|---|
| With replicates and missing value | CT0 | CT0 | CT8 | NA | CT16 | CT16 |
| With un-even interval and replicates | CT0 | CT2 | CT2 | CT10 | CT16 | CT20 |
The meta2d function in MetaCycle is designed to analyze different types of time-series datasets and can automatically select the proper methods to analyze different experimental designs. The implementation strategy used for meta2d is shown in the flow chart.
In addition to selecting the proper method(s) to analyze different experimental designs, meta2d can also integrate results from multiple methods. A detailed explanation of integration strategies is in the vignettes of MetaCycle.
meta2d calculates the amplitude with following model: \[Y_i = B + TRE*(t_i - \frac{\sum_{i=1}^n t_i}{n}) + A*cos(2*\pi*\frac{t_i - PHA}{PER})\]
The baseline and trend are explained below.
| Column name | Description | Column name | Description |
|---|---|---|---|
| ARS_pvalue | p-value from ARS | LS_BH.Q | FDR from LS |
| ARS_BH.Q | FDR from ARS | LS_period | period from LS |
| ARS_period | period from ARS | LS_adjphase | adjusted phase from LS |
| ARS_adjphase | adjusted phase from ARS | LS_amplitude | amplitude from LS |
| ARS_amplitude | amplitude from ARS | meta2d_pvalue | integrated pvalue |
| JTK_pvalue | p-value from JTK | meta2d_BH.Q | FDR based on integrated pvalue |
| JTK_BH.Q | FDR from JTK | meta2d_period | averaged period |
| JTK_period | period from JTK | meta2d_phase | integrated phase |
| JTK_adjphase | adjusted phase from JTK | meta2d_Base | baseline value given by meta2d |
| JTK_amplitude | amplitude from JTK | meta2d_AMP | amplitude given by meta2d |
| LS_pvalue | p-value from LS | meta2d_rAMP | relative amplitude |
# load 'shiny' package
library("shiny")
# load 'MetaCycle' package
library("MetaCycle")
# load 'dplyr' package
library("dplyr")
# load 'ggplot2' package
library("ggplot2")
# load 'cowplot' package
library("cowplot")# change working directory to the desktop
setwd("~/Desktop")
# check required directories under the desktop
dir()# load shiny package
library("shiny")
# use 'runGitHub' function of shiny package
runGitHub("MetaCycleApp", "gangwug")# load shiny package
library("shiny")
# use 'runApp' function of shiny package
runApp("~/Desktop/MetaCycleApp-master")# copy the 'meta2d_experimentA.csv' file from Desktop to 'result' directory in SRBR_SMTSAworkshop-master
file.copy(from = "~/Desktop/meta2d_experimentA.csv",
to = "~/Desktop/SRBR_SMTSAworkshop-master/meta2d_experimentA.csv", overwrite = TRUE)# load dplyr package
library("dplyr")
# read in the 'meta2d_experientA.csv' file
dataD <- read.csv("~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv",
stringsAsFactors = FALSE)
# take a look at column names of the data
colnames(dataD)## [1] "CycID" "JTK_pvalue" "JTK_BH.Q" "JTK_period"
## [5] "JTK_adjphase" "JTK_amplitude" "LS_pvalue" "LS_BH.Q"
## [9] "LS_period" "LS_adjphase" "LS_amplitude" "meta2d_pvalue"
## [13] "meta2d_BH.Q" "meta2d_period" "meta2d_phase" "meta2d_Base"
## [17] "meta2d_AMP" "meta2d_rAMP" "X18" "X20"
## [21] "X22" "X24" "X26" "X28"
## [25] "X30" "X32" "X34" "X36"
## [29] "X38" "X40" "X42" "X44"
## [33] "X46" "X48" "X50" "X52"
## [37] "X54" "X56" "X58" "X60"
## [41] "X62" "X64"
# filter the data by meta2d_BH.Q < 0.05 by 'filter' function of dplyr
cirD <- filter(dataD, meta2d_BH.Q < 0.05)
# see the number of circadian transcripts
nrow(cirD)## [1] 706
# select "CycID", "meta2d_period", "meta2d_phase", and "X18" to "X64" columns for
# drawing heatmap by 'select' function of dplyr package
figureD <- select(cirD, CycID, meta2d_period, meta2d_phase, num_range("X", seq(18, 64, by=2), width = 2))# load 'heatmapF' and 'phaseHist' function used to draw heatmap and histogram
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get the heatmap figure by 'heatmapF' function
heatmapFigure <- heatmapF(inputD = figureD, minfold=0.5, maxfold=2)
heatmapFigure# get the phase distribtuion figure by 'phaseHist' function
histFigure <- phaseHist(inputD = figureD, binvalue=seq(0,24,by=2), histcol = "blue")
histFigure# take a look at column names of 'cirD' dataframe
colnames(cirD)## [1] "CycID" "JTK_pvalue" "JTK_BH.Q" "JTK_period"
## [5] "JTK_adjphase" "JTK_amplitude" "LS_pvalue" "LS_BH.Q"
## [9] "LS_period" "LS_adjphase" "LS_amplitude" "meta2d_pvalue"
## [13] "meta2d_BH.Q" "meta2d_period" "meta2d_phase" "meta2d_Base"
## [17] "meta2d_AMP" "meta2d_rAMP" "X18" "X20"
## [21] "X22" "X24" "X26" "X28"
## [25] "X30" "X32" "X34" "X36"
## [29] "X38" "X40" "X42" "X44"
## [33] "X46" "X48" "X50" "X52"
## [37] "X54" "X56" "X58" "X60"
## [41] "X62" "X64"
# if it reports error after typing above command, please re-run the code of
# preparing 'figureD' in the first part of 3.6 section of this demo
# select its 'CycID', "meta2d_BH.Q", 'meta2d_phase' columns for later analysis
phaseD <- select(cirD, CycID, meta2d_BH.Q, meta2d_phase)
# read in the annotation file-'annoDbiMainInfMouse4302.txt' in the 'data-raw' directory
annoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/Mouse4302ProbeAnnotation.txt", stringsAsFactors = FALSE)
# add annotation information with 'inner_join' function of dplyr package
phaseD <- inner_join(phaseD, annoD, by=c("CycID" = "PROBEID"))
# take a look at the phaseD
head(phaseD)## CycID meta2d_BH.Q meta2d_phase ENTREZID SYMBOL
## 1 1416273_at 6.267205e-04 0.9112647 21928 Tnfaip2
## 2 1418853_at 1.005625e-04 2.2190313 28194 Apon
## 3 1418322_at 1.267709e-03 21.7744648 12916 Crem
## 4 1435748_at 5.230269e-08 16.5923801 14544 Gda
## 5 1448993_at 3.079038e-02 16.0442955 67841 Atg3
## 6 1428889_at 4.627240e-02 4.8168048 69113 Alkbh3
# filter out those probesets without annotation information
phaseD <- filter(phaseD, SYMBOL != "NA")
# select 'SYMBOL', 'meta2d_BH.Q' and 'meta2d_phase' columns for getting a dataframe without duplicate gene names
ori_phaseD <- select(phaseD, SYMBOL, meta2d_BH.Q, meta2d_phase)
# take a look at the row number with possilbe duplicate gene names
dim(ori_phaseD)## [1] 698 3
# load the 'uniF' function from 'fig.R' for doing this work
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get a dataframe without duplicate gene names
uni_phaseD <- uniF(ori_phaseD)
# take a look at the rownumber now
dim(uni_phaseD)## [1] 643 3
# select 'SYMBOL' and 'meta2d_phase' columns for PSEA analysis
pseaD <- select(uni_phaseD, SYMBOL, meta2d_phase)
# take a look at the pseaD
head(pseaD)## SYMBOL meta2d_phase
## 1 1110059E24Rik 15.0817948
## 2 1190002N15Rik 15.7701032
## 3 1700023H06Rik 6.0991388
## 4 1810055G02Rik 11.2992141
## 5 2310035C23Rik 1.6562627
## 6 2610507B11Rik 0.8431068
# write the 'pseaD' dataframe to a txt file-'experimentPSEA.txt' in 'result' directory
write.table(pseaD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt",
sep = "\t", quote = FALSE, row.names = FALSE)# read in the 'experimentPSEA.txt' file
ori_pseaD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt",
stringsAsFactors = FALSE)
# take a look at the data
head(ori_pseaD)## SYMBOL meta2d_phase
## 1 1110059E24Rik 15.0817948
## 2 1190002N15Rik 15.7701032
## 3 1700023H06Rik 6.0991388
## 4 1810055G02Rik 11.2992141
## 5 2310035C23Rik 1.6562627
## 6 2610507B11Rik 0.8431068
# read in the 'MouseHumanHomolog.txt' file in 'data-raw' directory of SRBR_SMTSAworkshop-master
homoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/MouseHumanHomolog.txt",
stringsAsFactors = FALSE)
# take a look at the data
head(homoD)## Mouse_gene Human_gene
## 1 Acadm ACADM
## 2 Acadvl ACADVL
## 3 Acat1 ACAT1
## 4 Acvr1 ACVR1
## 5 Sgca SGCA
## 6 Adsl ADSL
# join mouse gene and human homolog gene with 'inner_join' function
homo_pseaD <- inner_join(ori_pseaD, homoD, by=c("SYMBOL" = "Mouse_gene"))
# take a look at the joined dataframe
head(homo_pseaD)## SYMBOL meta2d_phase Human_gene
## 1 1110059E24Rik 15.0817948 C9orf85
## 2 1190002N15Rik 15.7701032 C3orf58
## 3 1810055G02Rik 11.2992141 C11orf24
## 4 2310035C23Rik 1.6562627 KIAA1468
## 5 2610507B11Rik 0.8431068 KIAA0100
## 6 2700094K13Rik 9.9926117 C11orf31
# select "Human_gene" and "meta2d_phase" columns for PSEA analysis
outD <- select(homo_pseaD, Human_gene, meta2d_phase)
# take a look at the selected data
head(outD)## Human_gene meta2d_phase
## 1 C9orf85 15.0817948
## 2 C3orf58 15.7701032
## 3 C11orf24 11.2992141
## 4 KIAA1468 1.6562627
## 5 KIAA0100 0.8431068
## 6 C11orf31 9.9926117
# write it to a file named-'human_experimentPSEA.txt' in 'result' directory of SRBR_SMTSAworkshop-master
write.table(outD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/human_experimentPSEA.txt",
sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)